 historical record


The Download: how the World Cup ball will fly and OpenAI's "super app"

MIT Technology Review

Plus: OpenAI plans to turn ChatGPT into a 'super app' before its IPO. Why this year's World Cup ball may not fly as far: Much is new about this month's FIFA World Cup tournament. It hosts more teams than ever before. It's the first to occur in three different host countries. And, like every World Cup for over half a century, it will employ a football with a brand-new design. Through wind-tunnel experiments, researchers found that long-distance kicks with Adidas's new Trionda ball might not travel as far as they did in the past.




Oldest known dog breed reveals hidden human history

Popular Science

The Iditarod is the longest annual sled dog race, covering over 1,500 miles across Alaska. A close look into canine genetics reveals sled dogs have been around and on the move for thousands of years. Specifically, the Greenland sled dog, called Qimmeq (singular) or Qimmit (plural) in Greenlandic, has a history traceable all the way back 9,500 years to Zhokhov Island in Eastern Siberia. And they've been a distinct, isolated group for about 1,000 years of that time.


Open-Set Living Need Prediction with Large Language Models

arXiv.org Artificial Intelligence

Living needs are the needs people generate in their daily lives for survival and well-being. On life service platforms like Meituan, user purchases are driven by living needs, making accurate living need predictions crucial for personalized service recommendations. Traditional approaches treat this prediction as a closed-set classification problem, severely limiting their ability to capture the diversity and complexity of living needs. In this work, we redefine living need prediction as an open-set classification problem and propose PIGEON, a novel system leveraging large language models (LLMs) for unrestricted need prediction. PIGEON first employs a behavior-aware record retriever to help LLMs understand user preferences, then incorporates Maslow's hierarchy of needs to align predictions with human living needs. For evaluation and application, we design a recall module based on a fine-tuned text embedding model that links flexible need descriptions to appropriate life services. Extensive experiments on real-world datasets demonstrate that PIGEON significantly outperforms closed-set approaches on need-based life service recall by an average of 19.37%. Human evaluation validates the reasonableness and specificity of our predictions. Additionally, we employ instruction tuning to enable smaller LLMs to achieve competitive performance, supporting practical deployment.


MMXU: A Multi-Modal and Multi-X-ray Understanding Dataset for Disease Progression

arXiv.org Artificial Intelligence

Large vision-language models (LVLMs) have shown great promise in medical applications, particularly in visual question answering (MedVQA) and diagnosis from medical images. However, existing datasets and models often fail to consider critical aspects of medical diagnostics, such as the integration of historical records and the analysis of disease progression over time. In this paper, we introduce MMXU (Multi-Modal and Multi-X-ray Understanding), a novel dataset for MedVQA that focuses on identifying changes in specific regions between two patient visits. Unlike previous datasets that primarily address single-image questions, MMXU enables multi-image questions, incorporating both current and historical patient data. We demonstrate the limitations of current LVLMs in identifying disease progression on MMXU-test, even those that perform well on traditional benchmarks. To address this, we propose a MedRecord-Augmented Generation (MAG) approach, incorporating both global and regional historical records. Our experiments show that integrating historical records significantly enhances diagnostic accuracy by at least 20%, bridging the gap between current LVLMs and human expert performance. Additionally, we fine-tune models with MAG on MMXU-dev, which demonstrates notable improvements. We hope this work could illuminate the avenue of advancing the use of LVLMs in medical diagnostics by emphasizing the importance of historical context in interpreting medical images. Our dataset is released at https://github.com/linjiemu/MMXU.


Early evidence of how LLMs outperform traditional systems on OCR/HTR tasks for historical records

arXiv.org Artificial Intelligence

We explore the ability of two LLMs -- GPT-4o and Claude Sonnet 3.5 -- to transcribe historical handwritten documents in a tabular format and compare their performance to traditional OCR/HTR systems: EasyOCR, Keras, Pytesseract, and TrOCR. Considering the tabular form of the data, two types of experiments are executed: one where the images are split line by line and the other where the entire scan is used as input. Based on CER and BLEU, we demonstrate that LLMs outperform the conventional OCR/HTR methods. Moreover, we also compare the evaluated CER and BLEU scores to human evaluations to better judge the outputs of whole-scan experiments and understand influential factors for CER and BLEU. Combining judgments from all the evaluation metrics, we conclude that two-shot GPT-4o for line-by-line images and two-shot Claude Sonnet 3.5 for whole-scan images yield the transcriptions of the historical records most similar to the ground truth.
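The character error rate (CER) named in the abstract above is conventionally the character-level edit distance between a transcription and its ground truth, divided by the ground-truth length. The following is a minimal sketch under that standard definition; the function names `edit_distance` and `cer` are illustrative and are not taken from the paper, whose exact normalization and preprocessing are not stated here.

```python
def edit_distance(reference: str, hypothesis: str) -> int:
    """Levenshtein distance: minimum insertions, deletions, and substitutions."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j characters of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_char in enumerate(reference, start=1):
        curr = [i]  # turning i reference characters into an empty string
        for j, hyp_char in enumerate(hypothesis, start=1):
            cost = 0 if ref_char == hyp_char else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate, normalized by reference length (an assumption)."""
    if not reference:
        raise ValueError("reference must be non-empty")
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Toy strings, not data from the paper.
    print(cer("kitten", "sitting"))
```

A lower CER means the transcription is closer to the ground truth; because insertions are counted, the value can exceed 1.0 when the hypothesis is much longer than the reference.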


'Hold on to your seats': how much will AI affect the art of film-making?

The Guardian

Last year, Rachel Antell, an archival producer for documentary films, started noticing AI-generated images mixed in with authentic photos. There are always holes or limitations in an archive; in one case, film-makers got around a shortage of images for a barely photographed 19th-century woman by using AI to generate what looked like old photos. Which brought up the question: should they? And if they did, what sort of transparency is required? The capability and availability of generative AI – the type that can produce text, images and video – have changed so rapidly, and the conversations around it have been so fraught, that film-makers' ability to use it far outpaces any consensus on how.


Can Foundational Large Language Models Assist with Conducting Pharmaceuticals Manufacturing Investigations?

arXiv.org Artificial Intelligence

General-purpose Large Language Models (LLMs) such as the Generative Pretrained Transformer (GPT) and Large Language Model Meta AI (LLaMA) have attracted much attention in recent years. There is strong evidence that these models can perform remarkably well in various natural language processing tasks. However, how to leverage them to approach domain-specific use cases and drive value remains an open question. In this work, we focus on a specific use case, pharmaceutical manufacturing investigations, and propose that leveraging historical records of manufacturing incidents and deviations in an organization can be beneficial for addressing and closing new cases, or de-risking new manufacturing campaigns. Using a small but diverse dataset of real manufacturing deviations selected from different product lines, we evaluate and quantify the power of three general-purpose LLMs (GPT-3.5, GPT-4, and Claude-2) in performing tasks related to the above goal. In particular, we examine (1) the ability of LLMs to automate the extraction of specific information, such as the root cause of a case, from unstructured data, and (2) the possibility of identifying similar or related deviations by performing semantic search on the database of historical records. While our results point to the high accuracy of GPT-4 and Claude-2 in the information extraction task, we discuss cases of complex interplay between the apparent reasoning and hallucination behavior of LLMs as a risk factor. Furthermore, we show that semantic search on vector embeddings of deviation descriptions can be used to identify similar records, such as those with a similar type of defect, with a high level of accuracy. We discuss further improvements to enhance the accuracy of similar record identification.
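Semantic search over vector embeddings, as described in the abstract above, is commonly implemented by ranking stored records by cosine similarity to a query vector. The sketch below illustrates that general technique only. The record IDs and three-dimensional vectors are made-up toy values, the helper names are hypothetical, and the abstract does not say which embedding model or similarity measure the authors used.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)


def top_k_similar(query: list[float],
                  records: dict[str, list[float]],
                  k: int = 2) -> list[tuple[str, float]]:
    """Return the k record IDs most similar to the query, best first."""
    scored = [(record_id, cosine_similarity(query, vector))
              for record_id, vector in records.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy embeddings standing in for embedded deviation descriptions.
    historical = {
        "DEV-001": [1.0, 0.0, 0.0],
        "DEV-002": [0.9, 0.1, 0.0],
        "DEV-003": [0.0, 1.0, 0.0],
    }
    for record_id, score in top_k_similar([1.0, 0.0, 0.0], historical):
        print(record_id, round(score, 4))
```

In practice the vectors would come from a text embedding model applied to each deviation description, and a vector index would replace the linear scan once the record database grows large.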


The Download: tracing a mysterious covid strain, and fighting dengue with drones

MIT Technology Review

Historians have started using machine learning to examine historical documents, including astronomical tables like those produced in Venice and other early modern cities. Proponents claim that the application of modern computer science to the past helps draw connections across a broader swath of the historical record than would otherwise be possible, correcting distortions that come from analyzing history one document at a time. But it introduces distortions of its own, including the risk that machine learning will slip bias or outright falsifications into the historical record. The way sea sponges pump water is really quite amazing. I never thought I'd be transfixed by a bed making competition, but here we are.